Mining linguistic tone patterns with symbolic representation
Abstract
This paper conceptualizes speech prosody data mining and its potential application in data-driven phonology and phonetics research. We first frame Speech Prosody Mining (SPM) within a time-series data mining framework. Specifically, we propose using efficient symbolic representations to compute similarity between speech prosody time series. We compare symbolic and numeric representations and distance measures in a series of time-series classification and clustering experiments on a dataset of Mandarin tones. The evaluation shows that the symbolic representation performs comparably to the other representations at a lower cost. This makes it possible to mine large speech prosody corpora efficiently and opens up the use of a wide range of algorithms that require discrete-valued data. We close by discussing the potential of SPM with time-series mining techniques in future work.
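The abstract does not say which symbolic representation was used. Purely as an illustration, the sketch below uses SAX (Symbolic Aggregate approXimation), a widely used symbolic representation for time series: it z-normalizes a contour, averages it into a few segments, and maps each segment to a letter using equiprobable cut points of the standard normal distribution. The function names, the toy contours, and the parameter choices (4 segments, 4 letters) are assumptions made for this example and are not taken from the paper.

```python
import bisect
import math
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znormalize(series):
    """Rescale to zero mean and unit variance (a flat series maps to zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each near-equal chunk."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    return [
        mean(series[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    if not 2 <= alphabet_size <= len(ALPHABET):
        raise ValueError("alphabet_size must be between 2 and 26")
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word (a short string of letters)."""
    cuts = breakpoints(alphabet_size)
    reduced = paa(znormalize(series), n_segments)
    return "".join(ALPHABET[bisect.bisect_right(cuts, v)] for v in reduced)


def mindist(word_a, word_b, series_length, alphabet_size):
    """SAX MINDIST between two equal-length words.

    Identical or adjacent letters contribute zero; other pairs contribute
    the gap between the cut points that separate them.
    """
    if len(word_a) != len(word_b):
        raise ValueError("words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ch_a, ch_b in zip(word_a, word_b):
        lo, hi = sorted((ALPHABET.index(ch_a), ALPHABET.index(ch_b)))
        if hi - lo > 1:
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(series_length / len(word_a)) * math.sqrt(total)


# Toy pitch contours (invented values, not data from the paper).
rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = rising[::-1]

print(sax(rising, 4, 4))   # abcd
print(sax(falling, 4, 4))  # dcba
print(mindist(sax(rising, 4, 4), sax(falling, 4, 4), len(rising), 4))
```

Once contours are reduced to short strings like these, similarity can be computed on a handful of symbols rather than on every sample, and string-based or discrete-valued algorithms become applicable. That is the kind of trade-off the abstract describes, although the representation and settings used in the paper may differ.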
Similar resources
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Similarity search optimization using recently-biased symbolic representation
Dimension reduction is one of the important requirements for a successful representation to improve the efficiency of extracting the attracting trend patterns on the time series. Furthermore, an efficient and accurate similarity searching on a huge time series data set is a crucial problem in data mining preprocessing. Symbolic representations have proven to be a very effective way to reduce th...
Clustering Large Symbolic Datasets
Clustering is the process of partitioning a set of labeled/unlabeled patterns into meaningful groups so that patterns in each group/cluster are similar to each other in some sense and patterns in different clusters are dissimilar in a corresponding sense. A major outcome of clustering process is an abstraction in the form of description of the clusters; this abstraction can be useful in several...
A text representation language for contextual and distributional processing
This thesis examines distributional and contextual aspects of linguistic processing in relation to traditional symbolic approaches. Distributional processing is more commonly associated with statistical methods, while an integrated representation of context spanning document and syntactic structure is lacking in current linguistic representations. This thesis addresses both issues through a nov...
Foundations of Data Mining and Knowledge Discovery
This paper discusses a view to capture discovery as a translation from non-symbolic to symbolic representation. First, a relation between symbolic processing and non-symbolic processing is discussed. An intermediate form was introduced to represent both of them in the same framework and clarify the difference of these two. Characteristic of symbolic representation is to eliminate quantitative m...
Publication date: 2016